Deciding How to Store Provenance
Abstract
The provenance of a file is metadata describing the file's history. Unlike the ordinary metadata stored in file systems, provenance is retrieved primarily by running queries. It therefore has to be indexed and needs a query interface. We believe that databases are the most appropriate place to store provenance, as they provide both indexing and query capabilities. The goal of this paper is to explore the most appropriate schema and database technology for storing provenance. We discuss the possible schemas for storing provenance and the tradeoffs involved in choosing each of them. We then characterize the behavior of some popular database architectures under provenance recording and querying workloads. The architectures we considered are RDBMS, schemaless embedded databases (Berkeley DB), XML, and LDAP. Finally, we present preliminary performance results for these database architectures on provenance recording and on some common provenance queries. Our results indicate that schemaless embedded databases have the best performance under most provenance workloads, and that RDBMS has the best space utilization under most provenance workloads.
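To make the indexing and query requirement concrete, here is a minimal sketch of one possible relational layout, using Python's built-in sqlite3 module. It is an illustration only and is not taken from the paper: the table name `provenance`, the (file, attribute, value) layout, the `input` attribute, and the helper functions are all hypothetical.

```python
import sqlite3


def build_store():
    """Create an in-memory store with one row per provenance fact (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE provenance (
            file      TEXT NOT NULL,  -- object the fact is about
            attribute TEXT NOT NULL,  -- e.g. 'input' or 'process' (assumed names)
            value     TEXT NOT NULL
        );
        -- Indexes serve lookups by file and reverse lookups by value.
        CREATE INDEX idx_by_file  ON provenance (file, attribute);
        CREATE INDEX idx_by_value ON provenance (attribute, value);
        """
    )
    return conn


def record(conn, file, attribute, value):
    """Record a single provenance fact."""
    conn.execute(
        "INSERT INTO provenance (file, attribute, value) VALUES (?, ?, ?)",
        (file, attribute, value),
    )


def ancestors(conn, file):
    """Return every file that `file` was directly or transitively derived from."""
    rows = conn.execute(
        """
        WITH RECURSIVE anc(name) AS (
            SELECT value FROM provenance
            WHERE file = ? AND attribute = 'input'
            UNION
            SELECT p.value FROM provenance AS p
            JOIN anc ON p.file = anc.name
            WHERE p.attribute = 'input'
        )
        SELECT name FROM anc ORDER BY name
        """,
        (file,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_store()
    record(conn, "report.pdf", "input", "report.tex")
    record(conn, "report.tex", "input", "data.csv")
    record(conn, "report.pdf", "process", "pdflatex")
    print(ancestors(conn, "report.pdf"))
```

The ancestry query is typical of a provenance workload: it follows derivation links across many records, which is why an index on the lookup columns matters. A schemaless embedded store would express the same facts as key/value pairs and perform the traversal in application code.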
Similar resources
Grouping Provenance Information to Improve Efficiency of Access Control
Provenance is defined in some literature as a complete documentation of process that led to an object. Provenance has been utilized in some contexts, i.e. database systems, file systems and grid systems. Provenance can be represented by a directed acyclic graph (DAG). In this paper we show an access control method to the provenance information that is represented by a directed acyclic graph and...
A Demonstration of TripleProv: Tracking and Querying Provenance over Web Data
The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triple stores. In this demonstration, we present TripleProv: a new system extending a native RDF store to efficiently handle the storage, tracking and querying of provenance in RDF data....
Local Clustering in Provenance Graphs (Extended Version)
Systems that capture and store data provenance, the record of how an object has arrived at its current state, accumulate historical metadata over time, forming a large graph. Local clustering in these graphs, in which we start with a seed vertex and grow a cluster around it, is of paramount importance because it supports critical provenance applications such as identifying semantically meaningf...
Automatic capture and efficient storage of e-Science experiment provenance
For the First Provenance Challenge, we introduce a layered model to represent workflow provenance that allows navigation from an abstract model of the experiment to instance data collected during a specific experiment run. We outline modest extensions to a commercial workflow engine so it will automatically capture provenance at workflow runtime. We also present an approach to store this proven...
Provenance for Nondeterministic Order-Aware Queries
Data transformations that involve (partial) ordering, and consolidate data in presence of uncertainty, are common in the context of various applications. The complexity of such transformations, in addition to the possible presence of meta-data, call for provenance support. We introduce, for the first time, a framework that accounts for the conjunction of these needs. To this end, we enrich the ...
Publication date: 2006